语言,视觉和多模式预审查的大量融合正在出现。在这项工作中,我们介绍了通用多模式基础模型BEIT-3,该模型BEIT-3,该模型在视觉和视觉任务上都实现了最新的转移性能。具体来说,我们从三个方面提出了大融合:骨干架构,预训练任务和模型扩展。我们介绍了多道路变压器进行通用建模,其中模块化体系结构可以实现深融合和模态特定的编码。基于共享的骨干,我们以统一的方式对图像(Imglish),文本(英语)和图像文本对(“平行句子”)进行蒙面的“语言”建模。实验结果表明,BEIT-3在对象检测(COCO),语义分割(ADE20K),图像分类(Imagenet),视觉推理(NLVR2),视觉询问答案(VQAV2),图像字幕上获得最先进的性能(可可)和跨模式检索(Flickr30k,可可)。
translated by 谷歌翻译
在计算机视觉和自然语言处理中,模型架构中的创新,提高模型容量的性能可靠地转化为性能增益。在与这种趋势的鲜明对比中,最先进的加强学习(RL)算法通常使用小的MLP,并且性能的增益通常来自算法创新。假设RL中的小型数据集需要简单的模型是很自然的,以避免过度装备;然而,这个假设是未经测试的。在本文中,我们调查RL代理商如何通过交换具有较大现代网络的小MLP,以跳过连接和标准化,专注于演员 - 评论家算法。我们经验验证,天真地采用这种架构导致不稳定和性能差,可能在实践中有助于简单模型的普及。但是,我们表明数据集大小不是限制因素,而是争辩说,不稳定性通过评论家占据渐变是罪魁祸首。我们证明光谱归一化(SN)可以减轻这个问题并使大型现代架构稳定训练。使用SN平滑后,较大的模型会产生显着的性能改进 - 表明除了算法创新外,通过专注于模型架构,可能拥有更多“简单”的收益。
translated by 谷歌翻译
Data-driven models such as neural networks are being applied more and more to safety-critical applications, such as the modeling and control of cyber-physical systems. Despite the flexibility of the approach, there are still concerns about the safety of these models in this context, as well as the need for large amounts of potentially expensive data. In particular, when long-term predictions are needed or frequent measurements are not available, the open-loop stability of the model becomes important. However, it is difficult to make such guarantees for complex black-box models such as neural networks, and prior work has shown that model stability is indeed an issue. In this work, we consider an aluminum extraction process where measurements of the internal state of the reactor are time-consuming and expensive. We model the process using neural networks and investigate the role of including skip connections in the network architecture as well as using l1 regularization to induce sparse connection weights. We demonstrate that these measures can greatly improve both the accuracy and the stability of the models for datasets of varying sizes.
translated by 谷歌翻译
We test the performance of GAN models for lip-synchronization. For this, we reimplement LipGAN in Pytorch, train it on the dataset GRID and compare it to our own variation, L1WGAN-GP, adapted to the LipGAN architecture and also trained on GRID.
translated by 谷歌翻译
High content imaging assays can capture rich phenotypic response data for large sets of compound treatments, aiding in the characterization and discovery of novel drugs. However, extracting representative features from high content images that can capture subtle nuances in phenotypes remains challenging. The lack of high-quality labels makes it difficult to achieve satisfactory results with supervised deep learning. Self-Supervised learning methods, which learn from automatically generated labels has shown great success on natural images, offer an attractive alternative also to microscopy images. However, we find that self-supervised learning techniques underperform on high content imaging assays. One challenge is the undesirable domain shifts present in the data known as batch effects, which may be caused by biological noise or uncontrolled experimental conditions. To this end, we introduce Cross-Domain Consistency Learning (CDCL), a novel approach that is able to learn in the presence of batch effects. CDCL enforces the learning of biological similarities while disregarding undesirable batch-specific signals, which leads to more useful and versatile representations. These features are organised according to their morphological changes and are more useful for downstream tasks - such as distinguishing treatments and mode of action.
translated by 谷歌翻译
Objective: Imbalances of the electrolyte concentration levels in the body can lead to catastrophic consequences, but accurate and accessible measurements could improve patient outcomes. While blood tests provide accurate measurements, they are invasive and the laboratory analysis can be slow or inaccessible. In contrast, an electrocardiogram (ECG) is a widely adopted tool which is quick and simple to acquire. However, the problem of estimating continuous electrolyte concentrations directly from ECGs is not well-studied. We therefore investigate if regression methods can be used for accurate ECG-based prediction of electrolyte concentrations. Methods: We explore the use of deep neural networks (DNNs) for this task. We analyze the regression performance across four electrolytes, utilizing a novel dataset containing over 290000 ECGs. For improved understanding, we also study the full spectrum from continuous predictions to binary classification of extreme concentration levels. To enhance clinical usefulness, we finally extend to a probabilistic regression approach and evaluate different uncertainty estimates. Results: We find that the performance varies significantly between different electrolytes, which is clinically justified in the interplay of electrolytes and their manifestation in the ECG. We also compare the regression accuracy with that of traditional machine learning models, demonstrating superior performance of DNNs. Conclusion: Discretization can lead to good classification performance, but does not help solve the original problem of predicting continuous concentration levels. While probabilistic regression demonstrates potential practical usefulness, the uncertainty estimates are not particularly well-calibrated. Significance: Our study is a first step towards accurate and reliable ECG-based prediction of electrolyte concentration levels.
translated by 谷歌翻译
As spatial audio is enjoying a surge in popularity, data-driven machine learning techniques that have been proven successful in other domains are increasingly used to process head-related transfer function measurements. However, these techniques require much data, whereas the existing datasets are ranging from tens to the low hundreds of datapoints. It therefore becomes attractive to combine multiple of these datasets, although they are measured under different conditions. In this paper, we first establish the common ground between a number of datasets, then we investigate potential pitfalls of mixing datasets. We perform a simple experiment to test the relevance of the remaining differences between datasets when applying machine learning techniques. Finally, we pinpoint the most relevant differences.
translated by 谷歌翻译
This paper presents an evaluation of the quality of automatically generated reading comprehension questions from Swedish text, using the Quinductor method. This method is a light-weight, data-driven but non-neural method for automatic question generation (QG). The evaluation shows that Quinductor is a viable QG method that can provide a strong baseline for neural-network-based QG methods.
translated by 谷歌翻译
Using 3D CNNs on high resolution medical volumes is very computationally demanding, especially for large datasets like the UK Biobank which aims to scan 100,000 subjects. Here we demonstrate that using 2D CNNs on a few 2D projections (representing mean and standard deviation across axial, sagittal and coronal slices) of the 3D volumes leads to reasonable test accuracy when predicting the age from brain volumes. Using our approach, one training epoch with 20,324 subjects takes 40 - 70 seconds using a single GPU, which is almost 100 times faster compared to a small 3D CNN. These results are important for researchers who do not have access to expensive GPU hardware for 3D CNNs.
translated by 谷歌翻译
未知的非线性动力学通常会限制前馈控制的跟踪性能。本文的目的是开发一个可以使用通用函数近似器来补偿这些未知非线性动力学的前馈控制框架。前馈控制器被参数化为基于物理模型和神经网络的平行组合,在该组合中,两者都共享相同的线性自回旋(AR)动力学。该参数化允许通过Sanathanan-Koerner(SK)迭代进行有效的输出误差优化。在每个Sk-itteration中,神经网络的输出在基于物理模型的子空间中通过基于正交投影的正则化受到惩罚,从而使神经网络仅捕获未建模的动力学,从而产生可解释的模型。
translated by 谷歌翻译